This web report includes descriptive statistics of the Seattle 911 CAD data. The report starts with an overall summary of the structure of the dataset and then steps through each variable in the dataset.
Let’s start by identifying the dimensions in the dataset.
## [1] 752421 17
There are 752,421 events and 17 variables in the data. The variable names from the CAD export are listed below.
## [1] "CAD_Event_ID" "Dispatch_ID"
## [3] "Event_First_Dispatch_Time_ATTR" "Call_Priority_Code"
## [5] "Call_Type_Desc" "Case_Type_Final_Desc"
## [7] "Case_Type_Initial_Desc" "Clear_By_Desc"
## [9] "Dispatch_Address" "Officer_Serial_Num"
## [11] "Precinct" "Sector"
## [13] "Squad_Desc" "Dispatch_Blurred_Latitude"
## [15] "Dispatch_Blurred_Longitude" "CAD_Event_Response_Time_Seconds_SUM"
## [17] "Total_Service_Time_Seconds_SUM"
Now, let’s find the number of categories in the categorical variables. In subsequent sections, I will step through each variable and summarize the distributions in greater detail.
## Dispatch_ID Call_Priority_Code Call_Type_Desc
## 719509 9 8
## Case_Type_Final_Desc Case_Type_Initial_Desc Clear_By_Desc
## 343 235 23
## Precinct Sector
## 6 17
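The report does not show how these counts were produced. Here is a minimal sketch in Python (standard library only) of the distinct-category counts and the Dispatch ID uniqueness check. It assumes each event has been loaded as a dictionary keyed by the CAD column names. The three sample events are made up for illustration, and this version counts a missing value as its own category.

```python
from collections import defaultdict

# Hypothetical subset of the CAD columns; the real export has 17.
COLUMNS = ["Dispatch_ID", "Call_Priority_Code", "Call_Type_Desc", "Precinct", "Sector"]

def count_categories(events, columns):
    """Number of distinct values per column (a missing value counts as one category)."""
    seen = defaultdict(set)
    for event in events:
        for col in columns:
            seen[col].add(event.get(col))
    return {col: len(seen[col]) for col in columns}

def is_unique(events, column):
    """True if no value of `column` repeats across events."""
    values = [event.get(column) for event in events]
    return len(values) == len(set(values))

# Made-up sample events.
events = [
    {"Dispatch_ID": 101, "Call_Priority_Code": 2, "Call_Type_Desc": "911", "Precinct": "WEST", "Sector": "KING"},
    {"Dispatch_ID": 102, "Call_Priority_Code": 1, "Call_Type_Desc": "ONVIEW", "Precinct": "WEST", "Sector": "MARY"},
    {"Dispatch_ID": 101, "Call_Priority_Code": 2, "Call_Type_Desc": "911", "Precinct": "UNKNOWN", "Sector": None},
]
print(count_categories(events, COLUMNS))
print(is_unique(events, "Dispatch_ID"))
```

In the real data, the same check would report 719,509 distinct Dispatch IDs across 752,421 rows, so the IDs are not unique.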
Dispatch ID - This is some sort of identifier. With 719,509 distinct values across 752,421 events, the identifiers are not unique to each event. What does the dispatch ID identify, and is it relevant to our analysis?
Call priority code and call type description have a manageable number of categories: 9 and 8, respectively. After taking a deeper dive into the univariate statistics in the sections below and understanding what these categories mean, we can decide whether any of them should be aggregated.
Case Type Final and Case Type Initial Descriptions - These two variables have the most categories, with 343 and 235, respectively. We will want to parse the categories and regroup them into a smaller, more manageable set for analysis. Reviewing the categories should suggest some strategies for aggregating them.
Clear by description - There are 23 categories in this variable. After further review below, we can look to see if any aggregation is necessary.
Precinct is a categorical spatial indicator. Its 6 categories are five regional precincts (North, South, East, West, and Southwest) plus Unknown.
Sector - There are 17 sectors. This variable appears to be another spatial category related to precinct. This will be described in the sector section below.
Before diving into the distributions of the categorical variables in greater detail, let’s take advantage of the fact that the data are time-stamped and get a sense of the frequency of events throughout the year.
The data are time stamped to the minute. In the graph below, I have displayed the frequency of events per day. Hover your mouse over the line graph to see the number of events that occurred on a given day.
The highest daily count was 2,530 events, on July 13th. In general, the summer months appear to have higher frequencies than the rest of the year.
November 14th, 2019 shows the most marked decrease, with only 76 events recorded. As shown in Table 1 below, this is far below the other low-volume days. Apart from November 13th (640 events), the lowest days still had more than 1,400 events. This raises the possibility of a glitch in the reporting system around that date.
| Date | # Events | Rank |
|---|---|---|
| 2019-07-13 | 2,530 | 1 |
| 2019-05-30 | 2,512 | 2 |
| 2019-05-31 | 2,509 | 3 |
| 2019-06-14 | 2,482 | 4 |
| 2019-06-01 | 2,470 | 5 |
| 2019-03-15 | 2,469 | 6 |
| 2019-05-10 | 2,468 | 7 |
| 2019-12-13 | 2,464 | 8 |
| 2019-04-26 | 2,457 | 9 |
| 2019-05-02 | 2,430 | 10 |
| 2019-12-24 | 1,571 | 356 |
| 2019-03-10 | 1,566 | 357 |
| 2019-02-09 | 1,561 | 358 |
| 2019-11-17 | 1,560 | 359 |
| 2019-11-28 | 1,501 | 360 |
| 2019-02-10 | 1,467 | 361 |
| 2019-12-25 | 1,445 | 362 |
| 2019-02-03 | 1,434 | 363 |
| 2019-11-13 | 640 | 364 |
| 2019-11-14 | 76 | 365 |
On average, there were 2,061 events per day in 2019. Apart from the 76-event day on November 14th (and, to a lesser extent, November 13th), there is not much skewness in the distribution of events throughout the year.
| Daily Avg | Std. Dev | Median |
|---|---|---|
| 2,061.427 | 238.363 | 2,076 |
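Below is a minimal sketch of the daily tally and the summary statistics in the table above. It assumes the dispatch timestamps are strings formatted as `YYYY-MM-DD HH:MM`, which is a guess; the real `Event_First_Dispatch_Time_ATTR` format may differ. The sketch also flags unusually low days with a simple mean-minus-k-standard-deviations rule. That rule is an added heuristic, not part of the original analysis.

```python
import statistics
from collections import Counter
from datetime import date, datetime

# Assumption: timestamps look like "2019-11-14 13:20" (minute resolution).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"

def daily_counts(timestamps):
    """Count events per calendar date."""
    return Counter(datetime.strptime(ts, TIMESTAMP_FORMAT).date() for ts in timestamps)

def summarize(counts):
    """Mean, sample standard deviation, and median of the daily counts."""
    values = list(counts.values())
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "median": statistics.median(values),
    }

def flag_low_days(counts, k=3.0):
    """Dates whose count falls more than k standard deviations below the mean."""
    stats = summarize(counts)
    cutoff = stats["mean"] - k * stats["stdev"]
    return sorted(day for day, n in counts.items() if n < cutoff)

# Small made-up example.
example = Counter({date(2019, 1, 1): 10, date(2019, 1, 2): 12, date(2019, 1, 3): 2})
print(summarize(example))
print(flag_low_days(example, k=1.0))
```

Using the reported mean (2,061.4) and standard deviation (238.4), a cutoff at k = 3 is about 1,346 events. That would flag only November 13th (640) and November 14th (76). The outliers inflate the standard deviation, so a median-based rule may be more robust.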
Code 2 is the most common priority code, with 219,406 events in 2019. According to Table 3, Code 2 accounts for about 29% of the events. Just over three-quarters of the events (76.22%) are classified as priority codes 1 through 3.
Codes 6 and -1 have the fewest events. They do not show up clearly in the graph, but Table 3 below shows that they total 44 and 274 events, respectively. I dug into the -1 code a little more, and it appears to be applied to very specific cases. All 274 of these events had the call type Onview and the initial case type description “DOWN - CHECK FOR DOWN PERSON”. You can flip through the paged table below to see the events with -1 priority codes.
One other point to note is that there is not a code 8; the codes skip from 7 to 9.
| Code | # Events | % |
|---|---|---|
| -1 | 274 | 0.04 |
| 1 | 162,377 | 21.58 |
| 2 | 219,406 | 29.16 |
| 3 | 191,722 | 25.48 |
| 4 | 15,085 | 2.00 |
| 5 | 6,793 | 0.90 |
| 6 | 44 | 0.01 |
| 7 | 131,411 | 17.47 |
| 9 | 25,309 | 3.36 |
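The counts and percentages in Table 3 come from a simple frequency tally. The sketch below uses the reported priority-code counts as input and rounds percentages to two decimals to match the table. The helper names are illustrative.

```python
from collections import Counter

def percentages(counts, digits=2):
    """Turn a {category: count} mapping into (category, count, percent) rows, largest first."""
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(category, n, round(100 * n / total, digits)) for category, n in rows]

def frequency_table(values, digits=2):
    """Tabulate raw values, e.g. one priority code per event."""
    return percentages(Counter(values), digits)

# Priority-code counts as reported in Table 3.
PRIORITY_COUNTS = {-1: 274, 1: 162_377, 2: 219_406, 3: 191_722, 4: 15_085,
                   5: 6_793, 6: 44, 7: 131_411, 9: 25_309}

table = percentages(PRIORITY_COUNTS)
total = sum(PRIORITY_COUNTS.values())
share_1_to_3 = round(100 * sum(PRIORITY_COUNTS[c] for c in (1, 2, 3)) / total, 2)
print(table[0], share_1_to_3)
```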
| Type | # Events | % |
|---|---|---|
| 911 | 325,008 | 43.19 |
| ONVIEW | 245,444 | 32.62 |
| TELEPHONE OTHER, NOT 911 | 155,803 | 20.71 |
| ALARM CALL (NOT POLICE ALARM) | 25,606 | 3.40 |
| TEXT MESSAGE | 429 | 0.06 |
| SCHEDULED EVENT (RECURRING) | 66 | 0.01 |
| PROACTIVE (OFFICER INITIATED) | 54 | 0.01 |
| IN PERSON COMPLAINT | 11 | 0.00 |
911 calls account for about 43% of the events. Onview and non-911 telephone calls are the second and third most common types, and together they account for just over half (about 53%) of the events. Text messages, officer-initiated (proactive) events, scheduled recurring events, and in-person complaints are very minor sources of events.

Questions:

1. Is the plan to focus solely on 911 call types? (If so, the question below is not relevant.)
2. What is the difference between Onview and Proactive (Officer Initiated)? Would these two types be worth combining?
Flip through the pages in the table to view the number of events with each type of case final description. Recall that this variable has 343 different descriptions.
Some of these descriptions have a general description followed by a more specific one after a dash. We could parse out the general description and then aggregate to get a smaller set of categories. I demonstrate this in the table below.
This aggregation strategy reduced the number of categories to 153. Traffic-related cases are the most common, followed by disturbance and suspicious circumstances. If you flip through the pages, some other categories also appear similar to these top three. For instance, traffic stop, listed on page 6, could also fit under traffic. Suspicious stop, also on page 6, seems related to suspicious circumstances. All of the descriptions and frequencies for the final case type descriptions are listed in the exported Excel file.

Other comments:

* Need to make sure to catch abbreviations using regular expressions (e.g., burg -> burglary).
* Similarly, use regular expressions for categories that look alike but differ in spacing or wording (e.g., Arson, Bombs, Explo; Abandoned car & Abandoned vehicle).
* “#NAME?” looks like it might be the classification for events that were not classified. There are 19,900 events with this classification, which is about 2.64% of events.
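Here is a minimal sketch of the parsing strategy described above. It keeps the text before the first dash, normalizes spacing and case, expands a hand-maintained list of abbreviations with regular expressions, and treats “#NAME?” as missing. It assumes the general and specific parts are separated by a dash with spaces on both sides, as in “TRAFFIC - MOVING VIOLATION”. Hyphenated words such as “NON-CRIMINAL” are therefore not split, but descriptions that use a bare hyphen as the separator would need an extra rule. The abbreviation map contains only the examples from the notes, and the lower-case test inputs are hypothetical.

```python
import re
from collections import Counter

MISSING = "#NAME?"

# Only the abbreviations mentioned in the notes; a full map would come from reviewing every description.
ABBREVIATIONS = {r"\bBURG\b": "BURGLARY", r"\bHAZ\b": "HAZARD"}

def general_category(description):
    """Text before the first ' - ' separator, upper-cased, with known abbreviations expanded."""
    if description is None or description.strip() == MISSING:
        return None
    general = re.split(r"\s+-\s+", description.strip(), maxsplit=1)[0]
    general = re.sub(r"\s+", " ", general).upper()  # collapse irregular spacing
    for pattern, replacement in ABBREVIATIONS.items():
        general = re.sub(pattern, replacement, general)
    return general

def aggregate(descriptions):
    """Count events per general category (None collects the missing descriptions)."""
    return Counter(general_category(d) for d in descriptions)

print(aggregate([
    "TRAFFIC - MOVING VIOLATION",
    "TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)",
    "ASSAULTS, OTHER",
    "#NAME?",
]))
```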
The top three or four initial case type descriptions occur at about the same frequency. The top four are also the top four in the final descriptions, but in a different order.
One note on the structure of these descriptions: fewer of them follow the pattern seen in the final descriptions, that is, a general description followed by a more specific description or detail, separated by a dash “-”. Below, I have parsed the descriptions as I did with the final case descriptions. However, this may be a less useful approach for this variable.

Other comments/questions:

* Need to make sure to catch abbreviations using regular expressions (e.g., HAZ -> HAZARD).
* “#NAME?” shows up again in this set of descriptions, though less frequently than in the final descriptions (n = 12,132, about 1.61% of events).
* Would it be useful to compare final and initial descriptions? We could use fuzzy matching and regular expressions if this is important. If a final description is missing (coded as #NAME?) but the initial description is not, should the initial description be applied?

Aggregating reduced the number of descriptions to 125. The top four descriptions remain the same, but the rest of the top 10 have shifted ranks (e.g., assault, trespass).
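If the initial description is used to fill in a missing final description, it is worth recording which values were imputed. This sketch uses Python's `difflib.SequenceMatcher` as a simple stand-in for fuzzy matching when comparing final and initial descriptions. The fallback rule is one possible answer to the question above, not a decision the report has made.

```python
from difflib import SequenceMatcher

MISSING = "#NAME?"

def resolve_case_type(final_desc, initial_desc):
    """Prefer the final description; fall back to the initial one when the final is missing.

    Returns (description, used_fallback) so imputed values can be tracked or excluded later.
    """
    if final_desc and final_desc.strip() != MISSING:
        return final_desc, False
    if initial_desc and initial_desc.strip() != MISSING:
        return initial_desc, True
    return None, False

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two descriptions."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

print(resolve_case_type("#NAME?", "DOWN - CHECK FOR DOWN PERSON"))
print(round(similarity("Abandoned car", "ABANDONED VEHICLE"), 2))
```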
| Description | # Events | % |
|---|---|---|
| ASSISTANCE RENDERED | 286,250 | 38.04 |
| REPORT WRITTEN (NO ARREST) | 169,102 | 22.47 |
| PHYSICAL ARREST MADE | 63,198 | 8.40 |
| UNABLE TO LOCATE INCIDENT OR COMPLAINANT | 57,973 | 7.70 |
| CITATION ISSUED (CRIMINAL OR NON-CRIMINAL) | 34,242 | 4.55 |
| NO POLICE ACTION POSSIBLE OR NECESSARY | 23,461 | 3.12 |
| ORAL WARNING GIVEN | 23,282 | 3.09 |
| FALSE COMPLAINT/UNFOUNDED | 18,688 | 2.48 |
| PROBLEM SOLVING PROJECT | 17,929 | 2.38 |
| OTHER REPORT MADE | 17,389 | 2.31 |
| RESPONDING UNIT(S) CANCELLED BY RADIO | 11,537 | 1.53 |
| FOLLOW-UP REPORT MADE | 9,320 | 1.24 |
| STREET CHECK WRITTEN | 7,023 | 0.93 |
| DUPLICATED OR CANCELLED BY RADIO | 5,925 | 0.79 |
| - | 2,626 | 0.35 |
| INCIDENT LOCATED, PUBLIC ORDER RESTORED | 2,406 | 0.32 |
| RADIO BROADCAST AND CLEAR | 653 | 0.09 |
| SERVICE OF DVPA ORDER | 554 | 0.07 |
| TRANSPORTATION OR ESCORT PROVIDED | 535 | 0.07 |
| NON-CRIMINAL REFERRAL | 216 | 0.03 |
| EXTRA UNIT | 54 | 0.01 |
| (NOT CURRENTLY USED) ALARM NO RESPONSE | 39 | 0.01 |
| NO SUCH ADDRESS OR LOCATION | 19 | 0.00 |
| Precinct | # Events | % |
|---|---|---|
| WEST | 219,114 | 29.12 |
| NORTH | 190,225 | 25.28 |
| SOUTH | 133,006 | 17.68 |
| EAST | 120,004 | 15.95 |
| SOUTHWEST | 84,140 | 11.18 |
| UNKNOWN | 5,932 | 0.79 |
The West precinct had the most events in 2019, with about 29% of the events. North was the next most common, with about 25% of the events. Southwest had the fewest events recorded, with 84,140 in 2019.
For 5,932 of the events, the precinct is unknown. We may be able to identify a precinct for these events if they have valid latitude and longitude coordinates. Let’s look to see if they do have lat and long:
| Coordinate Status | # Events |
|---|---|
| Not valid coords | 4,006 |
| Valid coords | 1,926 |
The majority of the events with unknown precincts do not have coordinates within the extent of Seattle/King County, Washington. However, 1,926 of these events do have coordinates that fall within the geographic extent of Seattle, so we can still use them. When I create a spatial object from the coordinates a few sections below, I will be able to plot these events. For some, the precinct may be obvious from the labels of neighboring events. If it is not obvious, the best approach would be to obtain a shapefile of the polygons for each of the five precincts and overlay it on the events. Each point would then get the name of the precinct polygon it falls within. Seattle’s Open Data website has such a shapefile, which I will use in the spatial geoprocessing section below.
There are some interesting bivariate analyses that could be explored, for example, call priority codes by precinct. View the interactive stacked bar chart below.
A few things stand out in the stacked bar graph of call priority codes and precincts:

* Just under half of the events with the specialized code -1 were in the West precinct.
* The North and West precincts had very similar shares of events in codes 1 through 3. In each of these codes, the North and West together account for just over 50% of the cases with that code.
* Just over 40% of the events classified as code 6 are in the North precinct.
* About 45% of code 9 events are in the West precinct, similar to the share of code -1 events.
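Here is a minimal sketch of both steps: filtering to plausible coordinates and then assigning a precinct by point-in-polygon. It assumes coordinates in decimal degrees. The bounding box is a rough, hypothetical extent for Seattle, and the two square “precincts” are toy polygons; the real ones would come from the Open Data shapefile. The ray-casting test handles a single simple polygon without holes. In practice, a GIS library's point-in-polygon test or spatial join (for example shapely's `Polygon.contains` in Python or `sf::st_join` in R) would handle multipolygons and projections more robustly.

```python
# Rough, hypothetical bounding box for Seattle: (min_lat, max_lat, min_lon, max_lon).
SEATTLE_BBOX = (47.48, 47.74, -122.46, -122.22)

def has_valid_coords(lat, lon, bbox=SEATTLE_BBOX):
    """True if both coordinates are present and fall inside the bounding box."""
    if lat is None or lon is None:
        return False
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def point_in_polygon(lon, lat, polygon):
    """Ray casting: count crossings of a ray from the point toward +x; odd means inside.

    `polygon` is a list of (lon, lat) vertices; the closing edge is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_precinct(lon, lat, precinct_polygons):
    """Name of the first polygon containing the point, else 'UNKNOWN'."""
    for name, polygon in precinct_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return "UNKNOWN"

# Toy polygons standing in for real precinct boundaries.
toy_precincts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(assign_precinct(1.5, 0.5, toy_precincts))
```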
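The percentages discussed for the stacked bar chart are shares within each priority code. The sketch below shows that calculation. It assumes the data are available as (priority code, precinct) pairs. The same helper gives within-precinct sector shares if the pairs are (precinct, sector). The sample pairs are made up.

```python
from collections import Counter, defaultdict

def share_by_group(pairs, digits=2):
    """Given (group, category) pairs, return {group: {category: % of that group's rows}}."""
    pairs = list(pairs)  # accept any iterable; it is traversed twice
    counts = Counter(pairs)
    totals = Counter(group for group, _ in pairs)
    shares = defaultdict(dict)
    for (group, category), n in counts.items():
        shares[group][category] = round(100 * n / totals[group], digits)
    return dict(shares)

sample = [(-1, "WEST"), (-1, "WEST"), (-1, "NORTH"), (2, "NORTH")]
result = share_by_group(sample)
print(result)
```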
Let’s turn to the sectors. There are 17 distinct sector names, and 5,932 events were not assigned a sector. These are the same events that are missing a precinct classification.
| Precinct | Sector | # Events | % of Precinct |
|---|---|---|---|
| SOUTH | OCEAN | 53,050 | 39.89 |
| SOUTH | ROBERT | 42,067 | 31.63 |
| SOUTH | SAM | 37,889 | 28.49 |
| EAST | EDWARD | 57,875 | 48.23 |
| EAST | GEORGE | 31,724 | 26.44 |
| EAST | CHARLIE | 30,405 | 25.34 |
| SOUTHWEST | FRANK | 43,161 | 51.30 |
| SOUTHWEST | WILLIAM | 40,979 | 48.70 |
| WEST | KING | 76,274 | 34.81 |
| WEST | MARY | 57,260 | 26.13 |
| WEST | DAVID | 47,126 | 21.51 |
| WEST | QUEEN | 38,454 | 17.55 |
| NORTH | BOY | 45,660 | 24.00 |
| NORTH | NORA | 42,120 | 22.14 |
| NORTH | UNION | 40,548 | 21.32 |
| NORTH | LINCOLN | 33,979 | 17.86 |
| NORTH | JOHN | 27,918 | 14.68 |
| UNKNOWN | NA | 5,932 | 100.00 |
Sectors are unique to precincts, so we can think of a sector as a subdivision of a precinct. In the table above, we see that within the South precinct, the Ocean sector had about 40% of the events, whereas the other two sectors, Robert and Sam, had about 30% each.
The Edward sector had nearly half of the events in the East precinct.
The Southwest precinct’s events were relatively evenly divided between its two sectors, Frank and William.
The West precinct, which has the most events of all the precincts, has a wide spread in the number of events across its four sectors. The King sector had the most events (76,274, about 35%), and the Queen sector had the fewest (38,454, about 18%).
The North precinct has 5 sectors. The Boy, Nora, and Union sectors have similar shares of events within their boundaries. The other two sectors, Lincoln and John, make up just over 30% (about 32.5%) of the events in the precinct.
The Seattle Open Data website does not appear to have a boundary shapefile or API for sectors. This may be something to inquire about if we want to do point-in-polygon analyses.
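The claim that sectors are unique to precincts can be checked directly: every sector should appear under exactly one precinct. Here is a minimal sketch, assuming (precinct, sector) pairs with missing sectors stored as `None`.

```python
from collections import defaultdict

def parents_by_child(pairs):
    """Map each non-missing child (e.g. sector) to the set of parents (e.g. precincts) it appears with."""
    parents = defaultdict(set)
    for parent, child in pairs:
        if child is not None:
            parents[child].add(parent)
    return parents

def is_nested(pairs):
    """True if every non-missing child appears under exactly one parent."""
    return all(len(p) == 1 for p in parents_by_child(pairs).values())

pairs = [("SOUTH", "OCEAN"), ("SOUTH", "SAM"), ("EAST", "EDWARD"), ("UNKNOWN", None)]
print(is_nested(pairs))
```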
Squad description is one of the variables with an unmanageable number of categories. Only 2,746 events are missing a squad description. If you flip through the pages of the table, you can see that the squad groups are named in various ways. Some are based on the field or area the squad works in (e.g., forensics, Arson/Bomb), and others are based on location (e.g., precinct + sector). If this variable is considered important, we would need to aggregate it the same way as the case type descriptions. We could use the first descriptor before the dash to get broad categories, then use regular expressions and fuzzy matching to reconcile abbreviations, misspellings, and differences in word order.
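One way to approach the fuzzy-matching step is to map each parsed prefix to a hand-curated list of canonical names with `difflib.get_close_matches`. This is a sketch under stated assumptions. The canonical list, the 0.8 similarity cutoff, and the sample squad names are all hypothetical, and location-based squads would need their own rule.

```python
import re
from difflib import get_close_matches

# Hypothetical canonical squad categories; a real list would be curated from the data.
CANONICAL = ["FORENSICS", "ARSON/BOMB"]

def squad_prefix(desc):
    """First descriptor before a ' - ' separator, upper-cased with spacing normalized."""
    prefix = re.split(r"\s+-\s+", desc.strip(), maxsplit=1)[0]
    return re.sub(r"\s+", " ", prefix).upper()

def normalize_squad(desc, canonical=CANONICAL, cutoff=0.8):
    """Closest canonical name for the prefix, or the prefix itself if nothing is close enough."""
    prefix = squad_prefix(desc)
    matches = get_close_matches(prefix, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else prefix

print(normalize_squad("Forensic - Day Shift"))
```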
## [1] 1342
There are 1,342 unique officer serial numbers in this dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 259 1483 989 14773646
Response time for each event is reported in seconds. The summary statistics suggest that some very long response times are outliers. The longest response time is 14,773,646 seconds, or roughly 171 days. Let’s parse the seconds into larger units of time.
With the times parsed into periods and sorted from longest to shortest, we can see that the longest time was just under 171 days and the case was a test call. This event is probably a candidate for exclusion. For completeness, the data are also displayed below sorted from shortest to longest, so that it is easier to see what the short response times are.
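Here is a minimal sketch of the seconds-to-period conversion. It produces a format similar to the parsed columns in the tables (for example “3H 41M 19S”). Negative values, which appear in the service-time data, keep a leading minus sign.

```python
def parse_seconds(total_seconds):
    """Split a duration into (sign, days, hours, minutes, seconds)."""
    sign = -1 if total_seconds < 0 else 1
    days, rem = divmod(abs(int(total_seconds)), 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return sign, days, hours, minutes, seconds

def format_period(total_seconds):
    """Render e.g. 13279 as '3H 41M 19S', omitting leading zero units."""
    sign, d, h, m, s = parse_seconds(total_seconds)
    parts = []
    if d:
        parts.append(f"{d}d")
    if d or h:
        parts.append(f"{h}H")
    if d or h or m:
        parts.append(f"{m}M")
    parts.append(f"{s}S")
    return ("-" if sign < 0 else "") + " ".join(parts)

print(format_period(14_773_646))  # longest response time
print(format_period(251_587))     # longest service time
```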
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -2953 504 1326 3072 3540 251587 281
The distribution of total service time is unusual. There are 281 events missing a total service time. Additionally, at least one event has a negative total service time recorded. First, let’s see how many negative values we have.
| Date | Service Time (Seconds) | Response Time (Parsed) | Case Type Final |
|---|---|---|---|
| 2019-11-03 | -2,953 | 0S | TRAFFIC - MOVING VIOLATION |
| 2019-11-03 | -2,604 | 3H 41M 19S | TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR) |
| 2019-11-03 | -1,829 | 1H 22M 0S | CRISIS COMPLAINT - GENERAL |
| 2019-11-03 | -1,829 | 1H 22M 0S | CRISIS COMPLAINT - GENERAL |
| 2019-11-03 | -1,091 | 9M 1S | DISTURBANCE - OTHER |
| 2019-11-03 | -998 | 4M 50S | ASSAULTS, OTHER |
| 2019-11-03 | -699 | 2M 9S | DISTURBANCE - OTHER |
| 2019-11-03 | -699 | 2M 9S | DISTURBANCE - OTHER |
There are only 8 events in the dataset with negative values. When we include information like the event date, parsed response time, and case type description, we notice that two pairs of rows appear to be duplicates (the -1,829 and -699 entries). The other thing that stands out is that all of these events were recorded on the same date, November 3rd. It is possible that the negative values were a recording error that occurred that day. November 3, 2019 was also the day daylight saving time ended in the US (clocks moved back one hour at 2:00 a.m.). Every negative value is smaller in magnitude than one hour (3,600 seconds), so the clock change is a plausible cause. We could also check the average service time on other events of a similar type to see whether the absolute value of total service time is reasonable.
Now, let’s look at the NA values. The events with missing values span a variety of case types. There appear to be some duplicates, e.g., the assault-DV case on January 13th. Again, it seems like event date and response time would be useful for identifying duplicates and then de-duplicating this dataset.
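Here is a sketch of the negative-value, missing-value, and duplicate checks described above, using the eight negative-service-time rows from the table. Matching on date, service time, and case type is a heuristic. Rows that share these fields are only candidate duplicates, and the full timestamp or an event identifier would give a stronger check. The field names are illustrative.

```python
# The eight negative-service-time rows from the table above.
rows = [
    {"date": "2019-11-03", "service_seconds": -2953, "case_type": "TRAFFIC - MOVING VIOLATION"},
    {"date": "2019-11-03", "service_seconds": -2604, "case_type": "TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)"},
    {"date": "2019-11-03", "service_seconds": -1829, "case_type": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "service_seconds": -1829, "case_type": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "service_seconds": -1091, "case_type": "DISTURBANCE - OTHER"},
    {"date": "2019-11-03", "service_seconds": -998, "case_type": "ASSAULTS, OTHER"},
    {"date": "2019-11-03", "service_seconds": -699, "case_type": "DISTURBANCE - OTHER"},
    {"date": "2019-11-03", "service_seconds": -699, "case_type": "DISTURBANCE - OTHER"},
]

def find_negative(rows, key="service_seconds"):
    """Rows with a recorded, negative value for `key`."""
    return [r for r in rows if r[key] is not None and r[key] < 0]

def find_missing(rows, key="service_seconds"):
    """Rows where `key` is missing (NA)."""
    return [r for r in rows if r.get(key) is None]

def deduplicate(rows, keys):
    """Keep the first row for each combination of the given key fields."""
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique

negatives = find_negative(rows)
deduped = deduplicate(rows, ["date", "service_seconds", "case_type"])
print(len(negatives), len(deduped))
# All negatives are under one hour in magnitude, consistent with a one-hour clock shift.
print(all(abs(r["service_seconds"]) < 3_600 for r in negatives))
```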
For the sake of consistency, I parsed the total service time into time periods as I did with the response time. See some of the output below.
With the period-parsed version of service time, we see that the longest service time is just under 3 days (251,587 seconds, or 2 days 21 hours 53 minutes). Notice that the six longest entries are missing case descriptions. There is also a duplicate among these six, the one on March 23rd. If you flip through the pages, you can spot more duplicates, which again suggests that de-duplicating on event date, time, and case type description would be most useful.
Before transforming the dataframe into a spatial object, the events with missing or invalid coordinates need to be removed. After filtering those events out, the transformed spatial object contains 675,664 locations. In total there are 76,757 events that do not have valid coordinates. Mapping all of these events as points results in over-plotting as shown below.
There are other approaches to visualization that would be more informative. One approach is to create a point density map to show where the highest and lowest numbers of events per area occurred in the city. Another approach is to aggregate the points to meaningful geographic units like ZIP codes or neighborhoods. The following sections demonstrate these approaches.
This interactive map clusters nearby points. Zoom into different parts of the city to see where clusters tend to occur.
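One simple way to build a point density map is to bin coordinates into a regular grid and count the events in each cell. This sketch assumes (latitude, longitude) pairs in decimal degrees and a hypothetical cell size of 0.01 degrees. A degree of longitude near Seattle is only about two-thirds as long as a degree of latitude, so the cells are not square on the ground. A projected coordinate system would give equal-area cells.

```python
import math
from collections import Counter

def grid_counts(points, cell_size=0.01):
    """Count points per grid cell of `cell_size` degrees; keys are (row, col) cell indices."""
    return Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in points
    )

# Made-up points: two near each other, one farther north.
points = [(47.605, -122.335), (47.606, -122.336), (47.655, -122.305)]
counts = grid_counts(points)
densest_cell, n = counts.most_common(1)[0]
print(densest_cell, n)
```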